Evaluation of Perstem: A Simple and Efficient Stemming Algorithm for Persian

Authors

  • Amir Hossein Jadidinejad
  • Fariborz Mahmoudi
  • Jon Dehdari
Abstract

Persian is a challenging language in the field of NLP. Right-to-left orthography, complex morphology, complicated grammatical rules, and different forms of letters make it an interesting language for NLP research. In this paper we measure the effectiveness of a simple and efficient stemming algorithm, Perstem, on Persian information retrieval. Our experiments on the Hamshahri corpus at CLEF 2009 show that the Perstem algorithm greatly improved both precision (+91%) and recall (+43%).

Similar resources

Ad Hoc Retrieval with the Persian Language

This paper describes our participation in the Persian ad hoc search during the CLEF 2009 evaluation campaign. In this task, we suggest using a light suffix-stripping algorithm for the Farsi (or Persian) language. The evaluations based on different probabilistic models demonstrated that our stemming approach performs better than a stemmer removing only the plural suffixes, or statistically bette...

A new hybrid stemming algorithm for Persian

Stemming has been an influential part of information retrieval and search engines. There have been tremendous endeavours to build stemmers that are both efficient and accurate. Stemmers can follow one of three methods: dictionary-based, statistical, or rule-based. This paper aims at building a hybrid stemmer that uses both the dictionary-based method and rule-based...

Query Wikification: Mining Structured Queries From Unstructured Information Needs using Wikipedia-based Semantic Analysis

Combining the language model and inference network, as implemented in the Indri search engine, is an efficient and verified approach. In this retrieval model, the user’s information need is expressed in Indri’s Structural Query Language. Although the SQL allows expert users to represent their information needs richly, the complexity of SQLs unfortunately makes them unpopular on the Web for ordina...

A New Method for Stemming in Persian Language Considering Exceptions

This paper presents a new stemming algorithm for the Farsi language. The stemmer removes suffixes and prefixes, but uses a database of exceptions to reduce the error rate. The proposed method improves both the speed of the stemmer and its error rate. The evaluation results on a small Farsi document collection show significant improvement in precisio...
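The general idea described in this abstract (strip affixes, but consult a list of exceptions first) can be sketched roughly as follows. This is only a minimal illustration, not the method of that paper or of Perstem: the affix lists, the exception set, the minimum stem length, and the function name `stem` are all assumptions chosen for the example.

```python
# Minimal sketch of affix stripping with an exception list.
# All lists below are small, hypothetical samples for illustration only;
# they are NOT the rule sets used by Perstem or by the papers listed here.

# Longer suffixes come first so that "ترین" is tried before "تر".
SUFFIXES = ("ترین", "های", "ها", "تر")
PREFIXES = ("نا",)

# Words that merely look affixed and must be left untouched.
EXCEPTIONS = {"بها", "دختر", "نام"}

# Refuse to strip an affix if fewer than this many letters would remain.
MIN_STEM_LEN = 2


def stem(word: str) -> str:
    """Return a crude stem: strip at most one prefix and one suffix."""
    if word in EXCEPTIONS:
        return word

    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break

    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break

    return word
```

With these sample lists, a plural such as "کتابها" loses its "ها" ending, while "دختر" is returned unchanged because it appears in the exception set, even though it ends in the letters "تر".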

A Bottom Up approach to Persian Stemming

Stemmers have many applications in natural language processing and in fields such as information retrieval. Many algorithms have been proposed for stemming. In this paper, we propose a new algorithm for the Persian language. Our algorithm is a bottom-up algorithm that can be reorganized without changing the implementation. Our experiments show that the proposed algorithm has a suitable resu...

Publication date: 2009